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Abstract 

Statistical fluctuations in population sizes of microbes may be quite large depending on 
the nature of their underlying stochastic dynamics. For example, the variance of the 
population size of a microbe undergoing a pure birth process with unlimited resources 
is proportional to the square of its mean. We refer to such large fluctuations, with the 
variance growing as square of the mean, as Giant Number Fluctuations (GNF). Luria and 
Delbriick showed that spontaneous mutation processes in microbial populations exhibit 
GNF. We explore whether GNF can arise in other microbial ecologies. We study certain 
simple ecological models evolving via stochastic processes: (i) bi-directional mutation, (ii) 
lysis- lysogeny of bacteria by bacteriophage, and (iii) horizontal gene transfer (HGT). For 
the case of bi-directional mutation process, we show analytically exactly that the GNF 
relationship holds at large times. For the ecological model of bacteria undergoing lysis or 
lysogeny under viral infection, we show that if the viral population can be experimentally 
manipulated to stay quasi-stationary, the process of lysogeny maps essentially to one-way 
mutation process and hence the GNF property of the lysogens follows. Finally, we show 
that even the process of HGT may map to the mutation process at large times, and 
thereby exhibits GNF. 

Keywords: Population dynamics, Stochastic growth, Mutation, Lysis-lysogeny, 
Horizontal gene transfer 



1. Introduction 

The presence of huge statistical fluctuations in microbial populations was first shown 
in a simple biological process, namely the birth or autocatalytic process that involves the 
binary fission of a cell into two genetically identical daughters. The full mathematical so- 
lution of the stochastic process with the exa ct distribu t ion of the population number, and 
in particular the variance, was calculated bv lDelbruckl(|l940h . In such a process the mean 



of the population number n(t) grows exponentially with time t as: (n(t)) = n(0) exp(/3i), 
where /3 is the birth rate, and the variance Var[n(t)] = (n(t)) 2 / n(0) — (n(t)), which grows 
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as square of the mean at large times. Due to such huge fluctuations this process stands in 
direct contrast with the familiar Poisson process which has Variance = mean. In this pa- 
per, we refer to the statistically large fluctuations characterized by Variance ~ (mean) 1 ' 
with v = 2, as 'giant number fluctuations' (GNF). This term has b een appeared previ- 



ously in other contexts like spatial fluctuations of bacterial numbers (jZhang et al 



20091 



2010), where v > 1 has been regarded as a signature of large fluctuations than usual. 
Here, since we use the pure birth process as a reference case to compare with other 
ecological models, by GNF we specifically mean that the power v = 2. The GNF prop- 
erty continues to hold even for a more comp licated process, name l y spo ntaneous forward 
mutation, as rev ealed by the classic work o f iLuria and Delbruck" (Il943l) . 

The work of Luria and Delbruck ( 1943h (henceforth LD) in which a bacterial pop- 
ulation in a nutrient-rich medium was allowed to grow for a while, and then subjected 
to an one-time attack of a lethal virus, was important in many ways. Their work pro- 
vided evidence for sp ontaneous random mutations in the DNA sequence of the genome 



([Alberts et all 120021 ) a nd also initiated a pract ical experimental approach to calcu- 
late the mutation rate ( Rosche and Foster! l2000h . A key result of their work was the 
demonstration of GNF in the number x(t) of the mutant bacteria at large times - 
Var[z;(t)] ~ (x(t)) 2 . It is important to note that GNF in the LD ecological model does 
not trivially follow from the results of a pure birth process. The mutant number in 
an LD ecological model and the bacterial number in a pu re birth process have distinct 
distribution functions — the former has a power law tail (lMandelbortl.ll974HMa et al ' 



19921 ; iQi Zhend . Il999h . while the latter has an exponential tail (jPelbruckl . Il940l) . Mor 



re- 
over the GNF property holds at large times, and not at early times, in both the LD 
and the birth processes. The LD model involves three processes altogether, namely the 
birth process of the wildt ype bacteria, that of its mutant, and the mutation process. 
Luria and Delbruck (Il943l) treated the tw o birth processes determ inistically while the 



mutations were treated as random events. Lea and Coulson ( 1949f) im proved the latter 
by also treating the birth of mutants stochastically. Finally Bartlett ( Armitaeel . 1952L 
Qi Zhend . ll999L pg. 23) obtained the most realistic calculations by treating all the three 
processes as stochastic. While all three formulations yield different distribution func- 
tions, the GNF property holds asymptotically for all of them. The LD result motivates 
us to explore the prevalence of the GNF property in other microbial ecologies. 

In a general microbial ecology, where complex inter-microbe interactions are involved, 
one may ask: what is the relationship between the variance and the mean of a particular 
species? In this paper we study the asymptotic population dynamics of three progres- 
sively more complex ecological models (as compared to the original LD case) namely (i) 
a model involving bi-directional mutation process, (ii) a particular limit of infection of 
bacteria by bacteriophage and (iii) two simple models of horizontal gene transfer. 

In the bi-directional mutation, bacteria not only mutate by a forward mutatio n like 
the L D case, but the mutant can convert back to the original wildtype bacteria ( Liebl . 
19511 ). Such reverse mutations are well known and in fact form the basis of the Ames 



test which is one of t he standard tests for determi ning the mutagenicity of chemicals 
( McCann et al. , 1975 ; Mortelmans and Zeieer , 2000h . We show analytically exactly that 
the asymptotic distribution of population sizes of both the wildtype and the mutant 
obeys the GNF relationship. 

In nature, temperate bacteriophage s (for example, phag e A which infects E. Coli ) 
can follow either one of two life cycles ( Sneppen and Zocchi . 2005 ; Court et"aH . 120071) . 
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In the lytic life cycle, the phage that infects the host cell multiplies through replication 
and ultimately bursts out by killing the bacterium. In the lysogenic life cycl e the phage 
comb i nes its genome wi t h that of the bacterium to f orm a stab l e lyso gen (jKourilskv , 



19731: IWeitz et all . 120081: IWang and Goldenfeldl . l2010i: IZeng et all l201dh . Interestingly, 



we show that if experimentally the phage population size can be controlled to stay quasi- 
constant, this three-species system reduces to an one-way mutation process involving 
two species. We numerically demonstrate this LD like behavior and the associated GNF 
property for our proposed protocol. 

Finally we study ecological models involving horizontal gene transfer (HGT). HGT 
occurs due to exchange of DNA segments of a genome between different strains of bac- 
teria, and can take place by means of three known mechanisms, namely transformation, 
transduction, or conjugation. The discovery of HGT as an important inter-microbe 
interaction over the past f ew decades has led to a resurgence of interest in micro- 
bial evolutionary dynamics (|Ochman et al. , bOOfi IChen et all 120051 iBabic et all 120081: 
Goldenfeld and WoeseL l2007h . Here we have found that ecological models involving bac- 



terial genotypes undergoing mutual HGT, may sometimes map to mutation processes at 
large times. We numerically demonstrate the appearance of GNF in two- and four-species 
HGT models. 

In all the ecological models we study in this paper, we assume a nutrient-rich environ- 
ment (like the experimental conditions of Luria and Delbriick) , so that unrestricted birth 
is possible for each population for indefinite time. Although in a natural environment 
the exponential phase (often also called the "logarithmic phase" ) of the bacterial growth 
may approach saturation due to overcrowding and limited food resources, in a controlled 
laboratory experiment one can replenish nutrients to prolong the exponential phase (see 
Liebl . 119511) . The curious reader may wonder whether the GNF property holds beyond 
the exponential growth phase or not. We will discuss below the case of a birth process 
with "limited resources" and deaths, and show that fluctuations arc ordinary Poisson-like 
as saturation is reached. 

Experiments that study microbial populations in the presence of unknown interactions 
among the different genotypes may therefore look for GNF. The prevalence of the GNF 
property in the systems that we study, suggests, that it is likely to be found often in the 
laboratory. 



2. Birth and bi-directional mutation processes 

2.1. Birth process 

The process by which a bacterial species multiplies by successive division is stochastic. 

In the presence of unlimited resources it is represented as: TV A TV + TV , where (3 is the 
birth rate. The probability P„(t) = Prob.[n(t) = n] of the random population size n(t) 
at any time t is governed by the Master quation: 

BP (t) 

= /3(n-l)P n _i(t)-/3nP„(i) (1) 



The exact distribution P n (t) was solved bv lDelbriick ( 1940l ) and that leads to: Var[n] = 



n} 2 /n(0) — (n). At large times, Far[n] ~ (n) 2 , which is the GNF property. 
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Figure 1: The number fluctuations in the exponential phase and saturation phase of the logistic birth 
model arc compared, (a) Log-log plot of Var[n] ~ (n) 2 in the exponential phase, with (n) ~ exp((/3 — 8)t) 
(see inset), and (b) log- log plot Var[n] ~ (n) in the saturation phase, with (n) ~ K (see inset). Note 
f) = 0.02, 8 = 0.01, n(0) = 10, and for the two insets K = 50000. Every data point (circles) of both (a) 
and (b) is obtained by averaging over 10 5 independent histories. The data points in (a) are obtained at 
different chosen time t for a fixed K = 50000, while those in (b) are obtained at different chosen values 
of K for a fixed time t = 3000 which is deep inside the saturation (see inset). 



In the presence of limited resources the bacterial population size is expected to sat- 
urate. Does the GNF property hold in the saturation phase? A standard model of 
microbial growth under nutrient-limiting conditions is the logistic growth model, where 
population numbers eventually saturate due to limited resour c es. W e studied a stochas- 
tic version of the logistic growth model (|Tan and Piantadosil 1 1 9 9 lh represented by the 
following processes: 



Ri 

R.2 



N > N ■ 



N 



N 







(2) 



Here, the parameter K is called the 'carrying capacity' and represents the maximal 
population that can be supported by the available resources. Note that the birth rate 
j3 = (3(1 — n/K) is not constant. The inclusion of death (reaction i?2 in Eq. @) is 
important — without it the evolution in the saturation phase would stall as n(t) — > K 
(and (3 — > 0). The Master equation of this process is non-linear in population size: 



dP n (t) 

dt 



/3(1 - (n - 1)/K)(n - l)P n -i(t) + S(n + l)P n+1 (t) 
-(/3(l-n/K) + 8)nP n (t) 



(3) 



We simulate Eq. [2] using kinetic Monte-Carlo (jBortz et al. iGillespid . 
20071 ) and the results for V^ar[n] = (n 2 ) — (n) 2 in the exponential and saturation phases 
of the system, are shown in Fig. 1(a) and 1(b), respectively. While (a) shows GNF, (b) 
shows l^ar[n] ~ (n). In this model, a small proportion of the stochastic histories lead to 
extinctions of the population ( ~ 0.1% for K = 5000 for example). We obtain yar[n] 
by doing a history average over cultures which do not become extinct, starting from a 
fixed n(0). The result in Fig. 1(b) shows that the giant fluctuations which arise in the 
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Figure 2: The schematic diagrams of the ecological models. The double arrows going up from a popula- 
tion (marked by fa or fa) represent the birth processes, (a) Bi-directional mutation, (b) Lysis-Lysogeny. 
(c) The LD limit of the Lysis-Lysogeny model with constant viral population, (d) Two species HGT. 
(e) Visualization of the two species HGT mechanism via a shared gene pool. 



exponential growth phase, diminish and become ordinary Poisson-like in the saturation 
regime. 



2.2. Bi-directional mutation 

In this section we calculate the moments of the population sizes of species involved 
in a bi-directional muta t ion process. We follo w the method of Bertlett (pg. 34-40 
Armitaeel . Il952t iBertlettl Il978l iQi Zhend . Il999i pg. 23) in which the mutation process 
is assumed to produce a mutant and a normal cell from a normal cell (formulation (B) 
of D. G. Kendall, (see Qi Zheng . 19991 pg. 4)). In nature there exists a pos sibility of a n 
occasional reverse mutation that takes the microbe to the original genotype (jLiebl . 1 1 9 5 lh , 
thereby diminishing the population of the mutant pool. The studies on spontaneous 
forward and reverse mutation of different bacterial genotypes h ave revea l ed that the 
latter two rates may differ by a factor of 10 — 100 in magnitude ( Bunting . 1 19461 : iLiebl . 
19511 ). Let the wildtype population be denoted by N (with a population size n(t)). The 
mutant population X (with a population size x(t)) can convert back to the wildtype 
population N (see Fig. 2a) via the reaction channel R4 in Eq. ((4]), which makes the 
problem two-way coupled. The reactions are: 



i?! : 


N 


4 


N - 




i? 2 : 


X 




X - 


-X 


R3 : 


N 




N - 


hi 


i?4 : 


X 




N - 


-X. 



(4) 



Here /3i and j3 2 are the bare birth rates of the wildtype and mutant population respec- 
tively, and fi\ and \i 2 are the forward and the reverse mutation rates respectively. 

We begin by considering the joint probability distribution P n , x (t) = Prob.[n(t) = 
n;x(t) = x]. From the rates given in Eq. the master equation is as follows: 

OP (t) 

d f ' = Pi{n- l)P„-i,x(0 + fo{x - l)P„ )X -i(t) + fiinP n ,x-i(t) + ^xPn-iAt) 
-{Pin + /3 2 x + nin + H2x)P n ,x( t ) ( 5 ) 

The above equation can be solved analytically exactly for its moments using the 
generating function technique (see |Appcndix A I . We find that the means of both N and 



X populations follow the same time development at asymptotically large times (see Eq. 
IA.4I in Appendix A): 



(n) ~ (x) ~ e Qlt , where, a x = [(ft + (3 2 ) + ^ - /3 2 ) 2 + 4/i lA i 2 ]/2 (6) 

Thus the bi-directional coupling makes the wildtype and mutant track each other in 
perfect synchrony. The effective growth rate a>i of the two species is a complicated 
combination of all the rates defining the ecological model. Turning to the fluctuations (see 
Eq. IA.4l in Appendix A), we find that mathematically the largest exponent contributing 
to the variance is such that at large t: 

Var[n] - Var[x] - e 2ait . (7) 

Thus the GNF property holds: X^ar[n] ~ (n) 2 and l/ar[x] ~ (x) 2 . We note that this 
GNF property holds as long as there is no saturation. 



3. LD limit of Lysis-Lysogeny 



The prolonged co-cvolution of temperate phage with a bacterial population can be 
modeled as a stochastic process wherein virus -infected bacteria either die by lysis, or 



trans form into another genotype via lysogeny (pneppen and Zocchil . 12005b iCourt et al 
20071 ). A large number of virus particles are born during lysis. The classic host-phage 



system of the bact eria E. Coli and phage A exhibits the processes of lysis and lysogeny 



( Zeng et al. . 2010l) . This ecology can be modeled by the following reactions (see Fig 



2b): 



Pi 
P 2 

P 3 

i?4 



N ^ N + N 
N 



V — > aV 



0(1-/) 
N + V > X 

X % X + X 



(8) 



Here N, V and X denote the sensitive bacteria, the virus particles, and the lysogenic 
bacteria that is immune to the virus respectively, and n(t), v(t), and x(t) are their 
respective population sizes. The sensitive bacteria grows at rate f3±. The viral infection 
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rate, i.e. the rate of a V interacting with an N, is <j>. After infection, the probability 
of lysis is / and that of lysogeny is (1 — /). On lysis, the bacterium dies and the 
number of viruses released is a (the average viral burst size) . On lysogeny, a stable new 
lysogen is formed which grows at a distinct rate fc- We assume that only one virus 
infects a bacterium at a time, i.e. the multiplicity of infection (MOI) is one. Recent 
expe riments hav e show n that the probability of lysogeny is « 0.2 when the MOI equals 
one (IZeng et all l2010h. though older experiments had indicated it to be negligible at 
this MOI ( KourilskyT 1973 ). One can write the Master equation for the joint probability 
P n ,x,v(t) = Prob.[n(t) = n; x(t) = x; v(t) = v] describing the ecological model in Eq.© 
(see Eq. (|B.ip in Appendix B); but because of it having nonlinear terms, unlike Eqs. (JT|) 
and (JSJ in Sec. it is hard to tackle analytically. We note that in the above we are 
ignoring the possibilities of delay of infection to lysis, prophage induction and curing of 
lysogens, and superi nfection. 

It is well known (jWeitz and Dushofj . liool IWang and Goldenfeldl . l2010l ) that without 
a finite carrying capacity of bacteria due to limited resources, a model like in Eq. © 
makes all the sensitive bacteria N go extinct. The reason is that a is typically large (say 
> 50), and in a well mixed medium, the viral population increases very fast, and infects 
and kills all the sensitive bacteria, leaving the lysogens alive which continue to replicate 
on their own. Apart from finite carrying capacity, another way to sustain this model 
ecology is to make the viral population size v(t) constant by some means. This cannot 
be done in an exact sense, because that would require viral bursts to stop. But this 
may be done in an approximate way by a plausible experimental protocol that involves 
replacing part of the culture media by fresh media (see Appendix C). 

If the viral population is kept fixed, under the condition v(t) = constant, the model 
defined by Eq. (JSJ reduces to a two-specics model (sec Fig. 2c) represented by 



Ri 
R 2 

i?4 



N^N- 



N 



N > X 



X^X 



X, 



(9) 



where, 4> = 4>v — constant is a new rate. The above reactions represent an LD-likc 
ecological model with a Master equation given by Eq. (|B.2p . The reaction R3 in Eq. © 
is like an uni-dircctional mutation reaction with a mutation rate \i = <j> (1 — /). Yet 
the mutation process N — > X is distinct f rom the process N N + X used in Bcrlctt's 
formulation of mutation (jBertlett . 19781 : Qi Zhengl . 1999[ pg. 23) — they ha ve been 
referred to as formulations (A) and (B) of D. G. Kendall (see Qi Zheng . 19991 pg. 4) 
respectively. Apart from this distinction from the LD ecological model, another point of 
difference is the death of the bacteria N (reaction i?2)- The latter leads to an effective 
growth rate j3 1 = (/3i — <j> ) of N. We present the explicit results of the moments of 
population sizes for Eq. (|9]) in Eqs. 



(|B.3I) and (|B.4|) . The results differ from those of 
the original LD problem only upto the constant prcfactors, which do not affect the time 
dependences. The means and the variances are like LD at large times. Thus for j3i > P2, 

(n) — (x) — e' 3 !* and ^ar[n] — Far[a;] — e 2 ^*, while for (3 2 > P\, (n) ~ eJ 3 ^, (x) ~ 

and T^ar[n] ~ e 2 ^ 1 *, yar[x] ~ e 2 ^ 2 '. Thus in all cases, the GNF property holds at large 
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Figure 3: Linear-log plot of n(t), x(t) and v(t) versus time t for a single stochastic history. The fitted 
exponential functions (solid continuous straight lines) are: (n) ~ 100 X exp(/3 1 t) (see section 3) and 
(x) ~ 7 X exp(/?2t)- At the bottom we have Var[x]/ (x) 2 which is essentially constant between t = 
200 — 400, demonstrating the GNF property — for Var[x] and (x), averages over 10 5 independent 
histories were done. The data arc for @i = 0.02, /3 2 = 0.023, = 0.00003, / = 0.8, a = 50, T = 0.001, 
n(0) = v(0) = 100 and x(0) = 0. 



times, i.e. Var[n] ~ (n) 2 and Var[a;] ~ (x) 2 , for the lysis-lysogeny model if v(t) is held 
constant. 

Wc suggest a way to experimentally hold th e viral populatio n v(t) quasi-stationary 
(see Appendix C) by "diluting out" the viruses (jKourilskv . after equal time inter- 

vals T. We implement the latter protocol numerically by resetting the viral population 
number to its initial value vq after each interval T. In the meantime over T, we time- 
evolve the population numbers according to t he original Lysis- Lysogeny re a ctions in Eq . 
© using the kinetic Monte-Carlo algorithm (jBortz et all Il975t iGillespiej . Il976l l2007t) . 



Thus over a coarse time scale that is greater than T, the viral population v(t) is quasi- 
stationary. We thus find a time window within which this ecological model follows the 
LD limit. 

We see in Fig. 3 that the data for the instantaneous population sizes n(t) and x(t) of 
the bacteria and the lysogens, match reasonably well with the growth curves of the means 
((n) and (x)) predicted for the LD like problem of Eq. In particular, the fluctuations 
of the lysogens follow the GNF property as shown in Fig. 3 for large times (within the 
window t = 200 — 400), although not at small times. The time window over which the 
system behaves like LD may be made larger by choosing a smaller interval T. Note that 
the window is eventually terminated by a sharp catastrophic rise in the viral number as 
can be seen around t ~ 450 in Fig. 3 (for a discussion see Appendix C). 

In summary, we have proposed an experimental protocol which can make a stochasti- 
cally evolving lysis-lysogeny ecology appear as an LD ecology, over a certain time window. 
As a result, the GNF property of the LD model carries over to this special case of lysis- 
lysogeny. This GNF property holds only when there is no finite carrying capacity for the 
populations. 



4. Horizontal gene transfer 



It is k nown that HGT produces a high degree of genomic s imilarity in the interacting 
microbes ( Ochman et al. , [20001 Goldenfeld and Woesd . 2007t ) . In the spirit of a model 



studied recently for HGT for four genotypes within a biofilm (jChia et all 120081) we have 
constructed a simpler model of HGT involving two bacterial genotypes, N and X, with 
birth rates fa and fa respectively. Population sizes are denoted by n(t) and x(t). It is 
assumed that ./V and X are closely related and they differ at the level of only one gene — 
N has a gene uii and X has a different gene while the rest of the genome is identical 
for both. One may think of N and X as the populations of a normal bacterium and an 
antibiotic resistant variant. The antibiotic resistant bacterium has a mutation in one gene 
and it can share th e mutated gene by HGT and t hereb y confer antibiotic resistance to the 
normal bacterium ( Akiba et al. ( 196Clh ; Barlowl (|2009l) ) . Similarly the normal bacterium 
can also share its copy of the gene and make the antibiotic resistant bacterium revert 
back to the normal state. The genomic matters of N and X are exchanged via HGT, 
transforming one genotype completely into other. This ecological model (see Fig. 2d) is 
described by the following reactions : 



Ri 
R 2 

Rs 

Ra 



N ^\ N + N 
X^X + X 

~fix/(n+x) 

N > X 

x > N 



(10) 



In the above, the processes of HGT are viewed as two-step processes. The genes are 
exchanged via a communally shared gene-pool (see Fig. 2e). Firstly, a particular species 
can get a suitable gene from the shared gene pool with a certain probability, which is 
assumed to be equal to the fraction of its contributor. Thus a gene ioi contibuted by X is 
acquired with probability x/(n+x), and u>± contributed by N is acquired with probability 
n/(n + x). Secondly, the newly acquired gene is pasted (or overwritten) upon the former 
one with rates 71 and 72. Thus the N genome gets transformed into a X genome at an 
effective rate of ^\x/(n + x) (i? 3 in Eq. (fTO)) ). while -f2Ti/(n + x) is the effective rate at 
which X genome gets transformed into a N genome (R4 in Eq. (fTU|) '); these effective rates 
are no longer constants like in the LD model and signify that the processes are nonlinear. 
Since, HGT is a relatively rare event in reality, we further assume {fa, fa} » {71,72} 
in the subsequent discussion. 

One can write the Master equation for the joint probability P n ,x{t) = Prob.[n(t) = 
n; x(t) = x] for the HGT model in Eq. [TO] as below: 



dP n ,x(t) 

at 



fa(n - l)P„- M (i) + fa(x - l)P n , a _ 1 (t) 

+j 1 [(n + l){x-l)/{n + x)]P n+ltX - 1 {t) 
+72[(n - l)(x + l)/(n + x)]P„- M+ i(t) 
-(fan + fax + (71 + 7 2 )nx/(n + x))P n , x (t) 



(11) 



Note that the above Master equation has nonlinear terms unlike Eqs. ([T]) and |5]) in 
Sec. [5] and is difficult to tackle analytically. One cannot write a system of ODEs for the 
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Figure 4: (a) Linear-log plot of (n) and (x) versus time t for two-species HGT. The data from kinetic 
Monte-Carlo simulation arc shown in symbols. The fitted exponential functions (solid continuous straight 
lines) are: (n) ~ 12.0 X exp((/3i —71 +72)*) and (x) ~ 12.0 X exp^t) (see Sec. (b) Linear-log plot of 
Var[n]/ (n) 2 and Var[x]/(x) 2 versus time t for two-species HGT. The straight lines show that the data 
is constant at large times. The data in both the curves are for n(0) = x(0) = 10, /3i = 0.02, P2 = 0.03, 
71 = 0.005 and 72 = 0.001. All data arc obtained by averaging over 10 5 independent histories. 



cumulants in a closed form, and in fact, following similar steps as in Appendix A, every 
ODE for a cumulant contains the next higher order cumulant. In spite of this difficulty, 
we will show that the unique algebraic form of non-linearities in the HGT model makes 
its asymptotic large time limit tractable. At large times, both the populations (N and 
X) are very large and one of them dominates over the other. 

We will discuss one case in detail, namely when X dominates over N (i.e. x » n in 
Eq. (fTTj)'). which occurs for /?2 > Pi at asymptotically large times. Under this condition, 
[x ± + x) ~ x/(n + x) w 1 in Eq. (jllj) . and one can get an approximate linear 

Master equation: 



/81 (n - l)P„- 1)B (t) + P2& - l)P BiX _ x (t) + 7l (n + l)P n+lia ._i(t) 
+ 72 (n - l)P n _ liB+1 (t) - (p x n + /3 2 x + ( 7l + 7a)n)P ni *(*) (12) 



at 



Note that the above Eq. (|12|) is similar to the Master equation of the LD like ecological 
model discussed in Appendix B (see Eq. (|B.2[0 — they are not exactly identical as the 
third and fourth terms in the two equations are different. Starting from Eq. (|12[) we have 
derived the ODEs of the cumulants and compared them with the Eq. (|B.3p in Appendix B 
— they are very similar, but not identical. After making the identifications: j\ — 72 = fJ. , 
Pi ~ (71 ~ 72) = P\ and 71 +72 = 4> , wc find that the new ODEs of K1.0, Ko,i an d ^2,0 i n 
the asymptotic HGT model are same as those of Eq. (|B.3[) . but ODEs of k^i and Ko,2 
have minor differences. The latter differences do not change the asymptotic temporal 
behavior. Solving the ODEs, at large times we get: (n) ~ e^ 1- ^ 71 l ' 2 ^ t , (x) ~ e^ 2 ' and 
yar[n] ~ e 2 (^i-(7i-72))^ V'arfa;] ~ e 2/3 ' 2t for the parameter regime P2 > Pi >> 7i > 72- 
Results for other parameter regimes, as well as the opposite case of n(t) » x(t) will not 
be discussed, as they are easy to derive following similar steps as above. 

Instead of an approximate analysis for large times as above, one may simulate the 
original exact non- linear model of the HGT process (Eq. (fTtHO using kinetic Monte-Carlo 
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for all times. We show our data in Fig. 4 — we find a match with the LD like behavior 
(as predicted from our approximate analysis above) for the means (Fig. 4a). At short 
times GNF is not seen (Fig. 4b), although the means grow exponentially at those times 
(see Fig. 4a). As time gets larger, the variances exhibit the GNF property (Fig. 4b) 
for both TV and X populations. Thus we arrive at a conclusion — the two species HGT 
process approximates a mutation process at large times, due to the algebric uniqueness 
of its non-linear interaction and has giant fluctuations in its population sizes. Does this 
conclusion extend to the HGT process involving more than two sp ecies? To answer 
this, we investigate a HGT model involving four genotypes studied bv lChia et al.l (|2008l ) 
and confirm that the GNF property holds in this model (see the detailed discussion in 
Appendix D). 

For multiple, i.e. s (with s > 2) number of species undergoing HGT, the Master 
equation in general will involve non-linear terms proportional to (X)I=i a i n i/ 2i=i ^i n i) x 
Tij, where r < s, m is the population size of the i th species, and {cii, hi] are constants. At 
large times such non-linear terms may reduce to approximate linear forms for a certain 
suitable choice of the parameters. This linearization would map the original non-linear 
processes of HGT to linear mutation processes. As a consequence, an LD like behavior 
with giant fluctuations may show up in certain parameter regimes of the HGT model 
with multiple species. 



5. Discussion 

In this paper, we have studied the property of giant number fluctuation (GNF) in 
a few interesting microbial ecological models. While this is well-known for the models 
of the birth process and of uni-directional mutation, we have shown that it also holds 
for bi-directional mutation, and may be seen under suitable conditions in ecologies of 
microbes evolving via HGT and Lysis-Lysogcny. In all the ecological models that we 
have studied, the GNF property appears to arise because the populations effectively 
obey linear Master equations in population sizes. Yet it is interesting to identify the 
regimes in these interacting models where the stochastic dynamics resembles the "LD- 
like" mutation process. 

We have shown that whenever an ecological model effectively reduces to a LD-like 
ecological model at the level of the Master equation, the GNF property follows. In the 
Lysis-Lysogeny model, given the non-linear terms in the Master equation, making a pre- 
diction about GNF becomes difficult. Yet in the special limit when the viral population 
may be controlled to be quasi-stationary, we see that this process has fluctuation prop- 
erties like a mutation process. Similarly for the HGT model, the non-linear terms of the 
Master equation sometimes (and for two-species HGT always) are of an algebraic form 
that at late times approach linear limiting forms. This again implies an LD like stochastic 
dynamics with an associated GNF property. The results for the saturation phase of the 
stochastic logistic birth process, suggest that exponential growth is necessary for seeing 
GNF. Whether a linearized Master equation involving multiple species would always lead 
to GNF, remains an open mathematical question to explore in future. 

Rece ntly, a simila r large fluctu ation property has been observed in active matter 
systems (jZhang et all 120091 l20ld) . where the non-equilibrium dynamics of swarming 



bacteria produces local spatial number fluctuations within a fixed volu me, which are 



non-Poissonian. We would like to comment that in such experiments (| Zhang et al 
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2010h . if the measurements are done over time periods greater than the division time 



scale (say ~ 30 min) of the bacteria, the sources of large fluctuations would be twofold. 
The first source of this large fluctuation would be non-equilibrium collective dynamics, 
while the second would be the stochastic microbial population growth itself (as pointed 
out in this paper) . Thus the current work may be of relevance for quantitative studies of 
microbial systems executing active motion over long times. Apart from that, the GNF 
in microbial ecologies is not just a mathematical curiosity, but has practical significance. 
What constitutes an asymptotic regime in practice depends upon the system — it is 
achieved when the time t is much larger than the inverse of the largest growth rate 
in the system. For bacteria with a high dividing rate, this may be reached in only 
a few hours or a day of culture, and the GNF therefore is of practical experimental 
importance. Luria and Dclbriick's work gave rise to fluctuation assays, where mutation 
rates are estimat ed by experimentally dete rmining the distribu tion of mutants in multipl e 
parallel cultures ( Rosche and Foster . 2000l) . As pointed out in Rosche and Foster (2000). 
alternate methods based on estimation of the mutant fraction in bacterial cultures are 
very unreliable due to the presence of GNF. Measurements of quantities that are related 
to total population numbers of cells in exponential growth are susceptible to such large 
variances, and more careful analysis of fluctuations may be warranted. Since we have now 
shown that GNF may arise in other interesting ecological models, more careful analysis of 
the population sizes in these ecologies need to be done by theorists and experimentalists 
in future. 
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Appendix A. Details of the calculations for bi-directional mutation 

Staring from Eq. §5$ in Sec. 12.21 we introduce the probability generating function 
(p.g.f.) G(zi, Z2,t) = X^ g =o z i z 2^p,q(^)i ana - fi 11 ^ its linear partial differential equation 
(PDE) as below: 



~8t = llz[^ lZl ( Zl _ ^ + ^ l2l ( Z2 ~~ + ^~[^2^2(^2 - 1) + ^2Z2(2l - 1)] 

(A.l) 

Next the cumulant generating function is used, which is defined as K(9i,02,t) = 
ln[G(zi = e 9l ,Z2 = e 92 ,t)] = ( e eiP+92q ). Using this definition we can transform the 
PDE for G(zi,Z2,t) into a PDE for K{9i,02,t) and then substitute an expansion for K, 
wherein 

00 

k= £ mei)/(i\ji)] Ki!j (t). (a.2) 

i+j>l 
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Here K^j{t) are the cumulants of the joint probability distribution. In particular the 
first few cumulants are given by ki.o = (n), fto.i = {%), K 2.o = Var[n], Ko,2 = Var[x] 
and Ki t i = (nx) — (n)(x) . Equating the coefficients of ft, ft, O162, ft 2 and Q\, we obtain 
a system of coupled linear ordinary differential equations (ODEs) for the cumulants as 
below: 

K l,o( t ) = Pl K l,0 + ("2^0,1 
K 0,l(*) = ^2^0,1+^1^1,0 

K l,l(0 = (A + /?2)«1,1 + Ml K 2,0 + ^0,2 
K 2,o(*) = ^1«1,0 + 2j8lK2,0 + 2/i2«l,l 

«o,2(*) = (/^2 +M2)k ,i + 2/3 2 ko,2 + Mi K i,o + 2^iki,i (A. 3) 

In above equation, primes denote the ordinary time-derivatives. Note that the linear 
PDE of p.g.f. G ultimately leads to linear cumulant equations, which is a property of a 
linear process like the process considered here. Moreover, by putting ^2 = in Eqs. (|A. 1[) 
and (fA~3j) . we get ba ck the correspond ing equations in the original Bertlett's formulation 
of the LD model (see Qi Zhene . 19991 pg. 23). Solving the system of ODEs given by Eq. 



A.3j) we obtain the means and variances of the populations as follows: 



(n) = A ie ait + A 2 e a2t 

(x) = Bie ait + B 2 e a2t 

Var[n] = (fte 2ait + C 2 e 2a2t + C 3 e ait + C A e a2t + C s e^ 1+Pa ^ 

Var[x] = C[e 2ait + C 2 e 2a2t + C 3 e ait + C\e a2t + C' 5 e i01+Mt 



where, a x = [(ft + ft) + ^(ft - ft) 2 + 4^ lA i 2 ]/2, 

a 2 = [(ft + ft) - VWi ~ P2) 2 + W 2 ]/2. (A.4) 

Here A\ 1 ■ ■ ■ , B2 and (ft, • • ■ , C 5 are constants involving the parameters ft , ft, A*i, /i2, ftO) 
and x(0). Since in Eq. (|A.4|1 . a\ is always greater than ct%, we arrive at the asymptotic 
expressions of the means in Eq. ^ of Sec. 12.21 Again, noting that the rate 2a± domi- 
nates over the other rates a±, «2, 2a 2 and (ft + ft), appearing in Eq. (|A.4[) . we get the 
asymptotic results of the variances in Eq. ([7]) of Sec. 



Appendix B. Details of the calculations for the Lysis-Lysogeny model 

For the Lysis-Lysogeny model defined in Eq. @, the non- linear Master equation is: 

dPn ^ {t) = P l {n-l)P n _ l . x . v {t)+P2{x-l)Pn. X -lAt) 

at 

+4>f{n + l)(v -a + l)P n +i, x .v-( a -i)(t) 
+(/)(! - f)(n + l)(v + l)P n+1>x _ hv+1 (t) 
-{/3m + fox + <f>nv)P w (t) (B.l) 
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Under the assumption of v = constant the ecological model of Eq. ([8]) exactly reduces to 
the model of Eq. (|9|) and the corresponding Master equation becomes linear as follows: 



dP (t) 

n Z ' = / 8i(n-l)P n _i )S (t)+ i 82(x-l)P„, x _ 1 (t) + 0/(n+l)P„ + i, x (*) 
at 

+4>'(1 - f){n + l)P n+1 . x ^(t) - (fan + fax + (f>'n)P n . x (t), (B.2) 

where <p = 4> v — constant. Following similar steps as in Appendix A, from the above 
equation we get the ODEs for the cumulants as below: 

K i,o(*) = Px^ifi 

K 0,l W = ^2«0,1 + M «1,0 

= (Pi + fh)Kl,l + M K 2,0 - M «1,0 

«2,o(*) = (A + ^ ) K 1,0 + 201^2,0 

K o,2(0 = #2«o,i + 2/3 2 k ,2 + A* «i.,o + 2/x (B.3) 

Here P 1 = (fa — 4> ) and /i = (1 — /). We recognize that fa is analogous to the birth 
rate of the wildtype, while /x is analogous to the mutation rate in the original LD model. 
The explicit results for the moments are: 

(n) = n(0)e^* 

(a?) = x (0) e ^ + y jL 'n(0)/^' 1 -p 2 )}[e^ t -e^ t ] 

Var[n] = [(fa + <f>' )n(0) / fa^e 2 ^* - A*] 

Var[x] = D 1 e 2 & + D 2 e 2 ^ + D 3 e& + D 4 e 02t + D B e^ + ^* (B.4) 

where the coefficients D§ are constants involving the parameters fa, fa, A* j 4* , n (0) 

and cc(0). 

Appendix C. A proposed experimental protocol to map a lysogeny process 
to a mutation process 

Our motivation for the reduction of a Lysis-Lysoge ny model to a mu tation-like eco- 



logical model comes from the experimental procedure of lKourilskvl (|1973f ). In this exper- 
iment, the reinfection of cells by phages produced in lysis was prevented by using two 
different broths. At first, sensitive cells were allowed to grow in tryptone maltose broth, 
where most of the infection events took place. Then, these infected cells were transferred 
into Hershey citrate broth where phage-adsorption was severely reduced, and the phages 
were diluted out. This leads us to suppose that the removal of unabsorbed free phages by 
repeatedly diluting the cultures may be possible experimentally, and this may keep the V 
population more or less constant. Based on this, we suggest that if one can periodically 
remove the free phages after equal time intervals T, then the viral population v(t) may 
become quasi-stationary. We show in Sec. [3] that it works at least numerically. 

The curious reader may note that the length of the time-window over which the LD 
like behavior is noticeable depends on the choice of T. The time window terminates by 
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an avalanche in viral number and a sharp decrease of the population of the sensitive 
cells which eventually go to extinction. The reason behind this phenomenon can be 
summarized as follows. 

In kinetic Monte-Carlo the average time gap (r) between any two successive reactions 
described in Eq. @ should be much greater than the chosen time T. Otherwise too 
many reactions will occur within T which can violate the condition v(t) « constant. 
But as (t) = l/((/?i + <j))n + (3 2 x) is actually a decreasing function of time, at some 
point T >> (t) and frequent lytic bursts guarantee a catastrophic proliferation in viral 
number. Thus a choice of small enough T is important. 



Appendix D. Description of the model of four-species HGT 



The original model of IChia et al.1 (|2008l) was treated at a mean- field (deterministic) 
level, and birth of species were ignored. In contrast, here we allow unrestricted birth of 
each species in the presence of unlimited resources, and treat the four-species HGT as 
a stochastic process. In this model the genomes have an active locus gene and another 
storage locus gene, either of which can be or 1. Thus four possible genotype populations 
can arise: Nqo, Noi, Nio and Nu, where the first index in the subscript denotes the active 
gene and the second index denotes the storage gene. Let n o(t), n i(t), n w {t) and nu(t) 
denote their population sizes respectively. These populations interact with each other via 
a shared gene pool. The probability of getting a gene is po = (2rtoo + n-oi + n\o) /2n to t, 
where n to t = noo + noi + "10 + nu. Conversely, the probability of getting a 1 gene is 
pi = 1 — pq. After having a gene i (i = or 1) from the shared gene pool, this gene 
can recombine with the active locus with probability qi, while it can recombine with the 
inactive storage locus with probability (1 — qi). The full stochastic processes along with 
the births of the four species can be described by the following reactions: 



JZi : 


Noo 


% N 00 + Noo 


R 2 ■ 


Noi 


% Noi + A^oi 


R 3 ■ 


N w 


H Nio + N w 


i?4 : 


Nu 


H N n + N n 


R 5 ■■ 


Noo 


rpi(l-gi) 

> Noi 


i?6 : 


Noo 


rpiqi 

> N 10 


R 7 : 


Noi 


rpo(l-go) 

>■ Noo 


R$ . 


Noi 


rpiqi 

>Nn 


Rg ■ 


N w 


rpoqo 

> Noo 


Rio ■ 


N w 


rpi(i-qi) 
> Nn 


Rn ■ 


Nu 


rpoqo 

► A^oi 


Rl2 ■ 


Nu 


rp (l-q ) 

> Nw 



(D.l) 
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Figure D.5: (a) Linear-log plot of (noo), (n-oi), (fiio) ancl ( n li) versus time t for four-species HGT. Data 
from kinetic Monte-Carlo simulation are shown in various symbols as indicated. The fitted exponential 
functions (solid continuous straight lines) are: (noo) ~ 10-0 x exp(Aif); (nrji) ~ 10.5 X exp(A2t); (nio) ~ 
8.6 X exp(A2t) and (nu) ~ 12.0 X exp(A4t) (see Eq. JD.2I) in Appendix C). (b) Linear-log plot of 
V ar[noo] / {noo) 2 , Var[noi]/(noi) 2 , Var[nio]/(nio) 2 and V ar[nn] / '(nu) 2 versus time t for four-species 
HGT. The straight lines show that the data are constant with time. The data are for noo(0) = nrjl(0) = 
nio(0) = nn(0) = 10, /3i = 0.03, /3 2 = /3 3 = 0.02, /3 4 = 0.01, r = 0.0005, go = 0.9 and qi = 0.1. The 
data are obtained by averaging over 10 5 independent histories. 



The reactions R\ to i?4 are linear birth processes and the reactions R§ to i?i 2 represent 
the nonlinear interactions through HGT. The rate of HGT in this model is r and we will 
also assume that {0\, 02, 03, 04} >> r. The Master equation is non-linear (in population 
sizes) in general. However at large times, the nonlinear terms may be approximated 
to linear ones for the choice of parameters: noo >> {noi, nio, nn}, noo >> n^, 0\ > 
{02, 03, 04}, and < qi < 1/2, 1/2 < < (where i = or 1). From the approximate 
linear Master equation we get the ODEs for the moments and solving them get the 
following solutions for the means: 



"00) 


rj Cl e Alt + c 2 e A2t 


+ c 3 e A3t + c 4 e A 


it 


"01) 


« k ie X2t + fc 2 e A3t 


+ fc 3 e A4t 




nio) 


k-^G " ~\~ k^ c ^ 


+ k' 3 e X4t 






m ke Xi \ 








where 






Ai 


= Pi , A 2 =(^ 


2+03 r\ 
2 4/ 


Vb 


A3 


(02 + 03 r N 


| - Vb , A 4 = 


04 -r 




V 2 4, 






b 


= r 2 +4(/? 2 -/? 3 


-r + 2rq o )(02 


~0 3 + 2r(q Q 



1)) (D.2) 

Here ci, • • • ,k are constants. Note that A 2 > A3 always, but nothing can be concluded 
about the relative magnitudes of A 2 and A4, which will depend on the exact parameter 
values. Thus two cases can arise at large times: (i) for A 2 > A4, we have (noo) ~ e Al ', 
(noi) ~ (nio) ~ e Xlt and (nn) ~ e A4 '; while (ii) for A 2 < A4, we get (n o) ~ e Alt and 
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(noi) ~ (mo) ~ (nn) ~ e Xit . 

We exactly simulate the reactions of Eq. (|D.1[) using kinetic Monte- Carlo. In certain 
regions of parameter space, at large times, we do find LD like behavior with the associated 
GNF property. We show the data for means and variances at large times in Fig. 5, for 
one such parameter and number regime defined by the following relations: (i) 0i > 
{02,03,0a}, (ii) < q. t < 1/2 and 1/2 < q x ^ < 0, where i = or 1, (iii) n 00 » 
{noi, nxo, Tin}, and (iv) hqq >> n\ x . The GNF property holds for each species as is 
clearly evident from Fig. 5b. We also find a match between our data and the approximate 
analytical answer (Fig. 5a). We note that unlike the two-species HGT, for the four- 
species HGT the reduction to LD like behavior may not be true in general for all regions 
of the parameter space. 

Similar analysis as above holds for two other parameter and number regimes : (1) 
04 > {0i,02, fo} , nn » {"oi, nio, n 00 } and n 00 » n 2 al ; (2) mi ~ not ~ n w ~ n 00 . 
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